The Basics

What is this project all about?

The simple answer is that you are going to tell a story with data.

But, like I mentioned above, you should be spending AT LEAST 50 hours on this project. So, the story you tell has to be a bit complex.

I think the most important piece is choosing data or a research question that you are passionate about! Or, to reference Marie Kondo, one that sparks joy.

I love cycling and I have data on almost all my bike rides for the past 4 or 5 years. Wouldn’t it be cool to tell a story with those data?

I am also passionate about education. I think a lot about how schools are funded. Heading up a fundraiser at my kids’ school has also had me thinking about how PTOs, PTAs, and similar groups can make a difference in the resources schools are able to offer. Maybe I could find some data on that ….

Where do I find data?

  • Like I mentioned above, FIRST find your passion.
  • Once you decide on a topic area, try to come up with some interesting research questions. For example, to use the bike data: Where do I bike most often? How fast do I go? Do I have a typical pattern in my long rides? Could I add some weather information to the data? … Where would I find that? … How would I add that? ….
  • Look for data. In my bike example, it’s pretty easy: the data are on my Garmin. But how do I get them out of there and into R? That’s a harder question, but I bet a bit of searching on the internet would point you in the right direction. What if your data are not on a Garmin?
    • Try an internet search first. That might lead you to a good source.
    • Kaggle has a ton of data sets. I would not recommend going there first, though. FIRST, find your passion.
    • TidyTuesday data … but again, FIRST, find your passion.
    • If there is a certain subject area you are interested in and you know a professor who studies that subject, you could ask them where to find some good data.
    • If the data you want are on the internet but not readily downloadable, you may be able to scrape them from the web. We’ll learn some of these techniques very soon.
    • Come talk to me! If you have your passion, I can help you find the data!
    • Maybe a local small business or nonprofit would have some data you could use … this would be especially good if it is something you are passionate about!

What does my final project look like?

I am giving you quite a bit of flexibility in what your final project looks like, but I see two broad categories. The one requirement is that it is done completely within RStudio, both the final written work and the final presentation. Assume your audience consists of people who read pop-statistics and pop-computing blogs: they are not familiar with your project, but they are comfortable with the fundamentals of data science.

  1. Technical blog post: similar to a paper, but a bit more casual and perhaps featuring more figures than you would typically include in a paper. We could find tons of examples. Here are just a few:
  2. Shiny app or flexdashboard: a type of interactive dashboard that we’ll discuss next week. This option involves less writing but likely requires you to learn a little more coding on your own. In addition to creating the app or dashboard, you will also be required to submit a “User’s manual” that describes how someone would interact with it. Check out some examples in the Shiny app gallery, the Show Me Shiny gallery, and the flexdashboard gallery.

Final Presentation

The final presentations are fairly short: about 12 minutes for each group, followed by 3-4 minutes of questions. Each presentation should include the following:

  • Introduction of your topic/research question.
  • Discussion of how you gathered the data. For some of you, this part will be a short discussion. For others, it could be a huge part of your project. You don’t need to tell us every detail about how you cleaned the data, but showing a couple of key steps that made your data usable might be very interesting.
  • Highlight the main conclusions you drew. This should be the largest part of your presentation. Those of you going the blog post route should highlight a few graphs and tables. Those of you building Shiny apps or flexdashboards could demo them live during the presentation.
  • Optional other parts: discuss any serious challenges you faced and how you overcame them, briefly discuss a new function you used that you think others might find useful, talk about what more you would do if you had more time, etc.